ABSTRACT



An architecture and method for booting a multi-processor system having processor local memory and shared global memory, with shared global memory access managed by an atomic memory access controller and cache coherence managed by software. Reset circuits are used to synchronize to a master clock a commonly distributed start signal and processor individualized restart sequences, and the reset circuit signals are distributed to reset both local and global memory. Global memory testing is assigned to a processor based upon its race status in completing an internal test sequence. The systems and methods are particularly suited to booting a group of multiple but relatively independent processors. Furthermore, the practice of the invention facilitates booting of such a system when one or more of the processors have been disconnected or have failed.

11 Claims, 5 Drawing Sheets 
APPARATUS AND METHOD FOR BOOTING A
MULTIPLE PROCESSOR SYSTEM HAVING A 
GLOBAL/LOCAL MEMORY ARCHITECTURE 


CROSS-REFERENCE TO RELATED APPLICATIONS

The present invention is related to U.S. Pat. No. 5,327,548 having common inventorship and assignee.

BACKGROUND OF THE INVENTION 

The present invention relates generally to multiple processor computer systems. More particularly, the invention is directed to systems and methods for booting/starting/restarting/resetting a multiple processor system characterized by the presence of a shared global memory and a multiplicity of relatively independently operable processors having individualized resetting and booting resources.

Systems composed of multiple but coordinated processors were first developed and used in the context of mainframes. More recently, interest in multiple processor systems has escalated as a consequence of the low cost and high performance of microprocessors, with the objective of replicating mainframe performance through the parallel use of multiple microprocessors.

A variety of architectures have been defined for multi-processor systems. Most designs rely upon highly integrated architectures because of the need for cache coherence. In such systems cache coherence is maintained through complex logic circuit interconnection of the cache memories associated with the individual microprocessors, to ensure data consistency as reflected in the various caches and main memory.

A somewhat different approach to architecting a multi-processor system relies upon a relatively loose hardware level coupling of the individual processors, with the singular exception of circuit logic controlling access to the shared global memory, and the use of software to manage cache coherency. An architecture which relies upon software managed cache coherency allows the designer to utilize existing processor hardware to the maximum extent, including the processor hardware integrated booting/starting/restarting/resetting resources. This independence of the processors also lends itself to multi-processor systems with higher levels of availability, in that such independence facilitates continuity of system operation in the presence of failures or removals of one or more processors. Coordination in the access to, and coherency with, a shared global memory is of course somewhat more difficult with such independence of processors.
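To make the software-managed coherence idea more concrete, here is a minimal Python sketch, assuming a hypothetical `SemaphoreController` that grants each lockable register to one processor at a time and a hypothetical `Processor` whose private dictionary stands in for a write-back cache. Software takes the semaphore guarding a block of global memory, invalidates any stale cached copy, works on the block in its cache, and flushes the block back before releasing the semaphore. The class names, method names, and this particular flush discipline are assumptions chosen for illustration, not details of the patented hardware.

```python
import threading
from typing import Dict, List, Optional


class SemaphoreController:
    """Hypothetical model of lockable semaphore registers: each register
    is owned by at most one processor at a time, but one processor may
    own several registers."""

    def __init__(self, count: int) -> None:
        self._guard = threading.Lock()                     # makes test-and-set atomic
        self._owner: List[Optional[int]] = [None] * count  # None means free

    def try_acquire(self, sem: int, cpu: int) -> bool:
        """Atomically claim register `sem` for `cpu`; False if already owned."""
        with self._guard:
            if self._owner[sem] is None:
                self._owner[sem] = cpu
                return True
            return False

    def release(self, sem: int, cpu: int) -> None:
        with self._guard:
            if self._owner[sem] != cpu:
                raise RuntimeError(f"cpu {cpu} does not own semaphore {sem}")
            self._owner[sem] = None


class Processor:
    """Processor with a private cache (a dict) in front of shared global memory."""

    def __init__(self, cpu: int, global_mem: Dict[int, int],
                 ctrl: SemaphoreController) -> None:
        self.cpu = cpu
        self.global_mem = global_mem
        self.ctrl = ctrl
        self.cache: Dict[int, int] = {}

    def add_to_block(self, sem: int, addr: int, delta: int) -> bool:
        """Read-modify-write of one global word under semaphore `sem`."""
        if not self.ctrl.try_acquire(sem, self.cpu):
            return False                                   # block owned elsewhere
        self.cache.pop(addr, None)                         # invalidate any stale copy
        self.cache[addr] = self.global_mem.get(addr, 0)    # fill from global memory
        self.cache[addr] += delta                          # modify the cached copy only
        self.flush(addr)                                   # write back before release
        self.ctrl.release(sem, self.cpu)
        return True

    def flush(self, addr: int) -> None:
        """Software cache flush: copy the line to global memory and drop it."""
        if addr in self.cache:
            self.global_mem[addr] = self.cache.pop(addr)


if __name__ == "__main__":
    ctrl = SemaphoreController(count=4)
    shared: Dict[int, int] = {}
    cpus = [Processor(n, shared, ctrl) for n in range(1, 5)]
    for p in cpus:
        p.add_to_block(sem=0, addr=0x100, delta=1)
    print(shared[0x100])  # 4
```

Because every update is flushed before the semaphore is released, the next owner always refills its cache from an up-to-date global copy; no hardware snooping between caches is involved.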

A fundamental problem which arises with such individualized processor multi-processor systems involves the coordination needed to accomplish system wide booting. Not only are the multiple processors designed and configured to accomplish individualized starting, but such starting must also incorporate the effects of an asynchronous common start signal. The asynchronous signal is usually derived from the status of the power supply. The multi-processor system must also have resources to synchronize the processors undergoing individualized starting to a master clock, and devices and methods to ensure initialization and testing of all the processors as well as the shared global memory. Accomplishing this in the face of a failure in one or more of the processors complicates the management of the booting operation, in that booting responsibilities cannot be permanently allocated to selected ones of the processors.

SUMMARY OF THE INVENTION 

The present invention defines a multiple processor architecture and method of operation in which a plurality of processors having individual starting resources respond to a common start signal and master clock to boot not only their individualized processor resources but also the system level global memory. Furthermore, these objectives are attained in the context of an architecture which boots notwithstanding a failure in or absence of one or more of the individualized processors.

In one form, the present invention involves apparatus for booting a multiple processor system having both processor local and shared global memory, wherein the processors have individualized means for starting, the system includes a means for generating a common starting signal to all processors, the system includes a master clock means for synchronizing the multiple processors, and the system includes a means for testing the local and global memory in synchronism with the master clock means and responsive to the common start signal. In another aspect, the invention is directed to methods which perform the steps defined by the apparatus.

A preferred embodiment of the invention involves a multiplicity of processors responsive to individualized off-chip sequencers for processor starting and testing. Each processor has a local memory and access to shared global memory through a non-blocking cross-point switch. Access to global memory is coordinated through an atomic memory access controller, while cache coherence is managed through software. A common start signal is generated in response to the status of a shared power supply and is synchronized to a master clock through reset circuits. The reset circuits also synchronize and coordinate the reset of the local and global memory. Testing of the global memory is accomplished by the first of the processors to reach a defined state in a program load sequence; that status gives the first processor access to the global memory while isolating the other processors from such access.
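The patent does not spell out, at this point, how a reset circuit brings the asynchronous power good signal into step with the master clock. A common textbook technique for this is a two-stage flip-flop synchronizer, and the Python sketch below models that technique cycle by cycle. It is an assumed illustration of the general idea, not the logic of FIG. 2; the function name `synchronize`, the two-stage default, and the example waveform are hypothetical, and a software model like this captures only the timing alignment, not the metastability protection a real synchronizer provides.

```python
from typing import Callable, List


def synchronize(async_level: Callable[[float], int],
                clock_period: float,
                cycles: int,
                stages: int = 2) -> List[int]:
    """Pass an asynchronous level through a chain of `stages` flip-flops
    clocked by the master clock; return the synchronized output seen
    after each rising edge."""
    chain = [0] * stages                  # flip-flops start out reset (low)
    out: List[int] = []
    for n in range(cycles):
        t = n * clock_period              # time of the n-th rising edge
        # On each edge the first stage samples the input and every other
        # stage takes the value its predecessor held before the edge.
        chain = [async_level(t)] + chain[:-1]
        out.append(chain[-1])
    return out


if __name__ == "__main__":
    def power_good(t: float) -> int:
        # Hypothetical power good waveform that rises between clock edges.
        return 1 if t >= 2.3 else 0

    print(synchronize(power_good, clock_period=1.0, cycles=7))
    # [0, 0, 0, 0, 1, 1, 1]
```

Although power good rises at t = 2.3, between edges, the first stage captures it at the edge at t = 3.0 and the synchronized output follows at t = 4.0, so every change seen downstream lines up with a master clock edge.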

The benefits and features of the architecture and methods to which the present invention pertains will be more clearly understood and appreciated upon considering the ensuing description of a detailed embodiment.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGS. 1A and 1B are schematic block diagrams of a multi-processor system.

FIG. 2 is a block diagram showing the circuit functions embodied in a reset circuit.

FIGS. 3 and 4 illustrate by waveforms the method by which the circuit operates to accomplish resetting and testing for two different hardware starting conditions.

BRIEF DESCRIPTION OF THE PREFERRED EMBODIMENT

FIGS. 1A and 1B illustrate by schematic block diagram an architecture for the multi-processor system to which the present invention pertains. Included within the system are four processors, identified by reference numerals 1-4. A representative example of a processor is the RISC System/6000 workstation with associated AIX Operating System as is commercially available from IBM Corporation. Each processor includes an off-chip sequencer (OCS) 6 and clock 7, which together cycle the related processor through a sequence of reset and test conditions in anticipation of commencing the initial program load (IPL) to boot the operating system. As conventionally practiced, once the off-chip sequencer 6 completes its cycles, the initial program load (IPL) ROM 8 is accessed to commence the booting of the operating system from non-volatile storage such as a hard disk (not shown). The multi-processor system in FIGS. 1A and 1B shows the presence of multiple such processors and their individually related starting systems. Associated with each processor 1-4 is a respective and locally addressable memory block, identified by reference numerals 9, 11, 12 and 13. Though not explicitly shown, each processor also includes a cache type memory for both instructions and data. As noted earlier, cache coherency is managed by software in a manner to be described hereinafter.

The creation of a multi-processor system from a multiplicity of individualized processor systems, including their related starting and memory resources, introduces the need for the other elements. For example, a power good signal on line 14 is distributed to all processor off-chip sequencers to initiate the start sequence. As would be expected, the power good signal on line 14 is asynchronous to the master clock signal on line 16. Therefore, the initiation of the off-chip sequencers is not coincident with the master clock. This synchronization problem is further exacerbated by the fact that the embodying off-chip sequencers 6 are often synchronized to their own clocks 7.

Further aspects of the multi-processor system reside in atomic semaphores 17 of atomic controller 15, which allow software to coordinate accesses to the global memory array, generally at 18. The atomic semaphore controller uses lockable semaphore type registers. The atomic semaphore controller will only allow one processor at a time to acquire exclusive access to a semaphore register. However, different processors may own different semaphores at the same time, and each processor may own more than one semaphore at a time. Software uses the semaphores to select which processors can access different blocks of global memory. Software also uses cache flush cycles to maintain global memory coherence with respective processor caches. Atomic counter 19 is, for purposes of the multi-processor system boot, used to select the processor which tests global memory array 18 for defects and the like.

Non-blocking cross-point switch 23 uses a relatively conventional design to allow processors 1-4 direct access to the whole of global memory array 18 in the absence of any address contentions. The processors are thereby able to communicate concurrently with the global memory.

The management of the boot operation for the multi-processor system in FIGS. 1A and 1B is accomplished through two reset circuits, reset circuit RC1 at reference numeral 21 and reset circuit RC2 at reference numeral 22. Though a single reset circuit would normally suffice, the embodiment utilizes two because of physical chip size constraints and to minimize timing skews within the system. Reset circuits 21 and 22 are cross-coupled to ensure that both parts of global memory array 18, namely banks 0-3 and 4-7, are reset at substantially identical times.

FIGS. 1A and 1B also show the presence of memory isolators 26, interposed between each processor and the memory bus extending to both the local and the global memory. Memory isolators 26 are used to selectively decouple the off-chip sequencing activities of the three processors which are not performing the global memory test. This prevents extraneous memory bus activity from reaching the global memory while the global memory is being tested by the single selected processor.

The multi-processor boot operation of the embodying system in FIGS. 1A and 1B begins with an asynchronously generated power good signal on line 14. The asynchronous power good signal initiates a multiplicity of asynchronous and individually clocked reset signals in the off-chip sequencers individually associated with each processor. The reset signals emanating from such off-chip sequencers are synchronized to the master clock signal on line 16 in reset circuits 21 and 22, wherein reset circuit 21 accomplishes the reset synchronization for processors 1 and 2 while reset circuit 22 does likewise for processors 3 and 4. The clock synchronized reset signals are conveyed to the respective processors. Also emanating from reset circuits 21 and 22 are clock synchronized reset signals directed to the respective local memories 9, 11, 12 and 13, as well as to the respective portions of global memory array 18. In this way, the processors retain substantially independent booting or starting resources, yet their common power good type start signal and individualized reset signals are synchronized to a master clock.

Each off-chip sequencer cycles through multiple states during the course of testing its associated processor. Included within those states are multiple reset cycles which are again synchronized through reset circuits 21 and 22. Each off-chip sequencer concludes with the loading of the initial program load code from ROM 8, which code then initiates the loading of the operating system. As embodied, the initial program load code includes an instruction which directs the processor to read the data in counter 19 of atomic controller 15. Counter 19 is initialized to zero at power up and is incremented after each processor read. Reads by successive processors are serialized, so that no two processors read the same value. The 0 value identifies to the recipient processor that it is to test not only its own local memory but also the whole of the global memory. In contrast, processors reading non-zero values are directed to test only their respective processor local memories. The requirement that only one processor test the global memory during an interval of time is clearly important, while the selection of the first processor to read the counter draws upon practical considerations. Namely, since the off-chip sequencers are not synchronized, predicting which processor will commence IPL first is not practical. Furthermore, if one or more processors are inoperative, the original design goal requires that the system boot operation still be completed and that only a single processor undertake to test the global memory.

On reflection, it should be apparent that this system defines an architecture and related method of operation whereby booting of the system is accomplished without regard to the asynchronous nature of the initiation signal or of the processor individualized starting sequences, and without regard to the presence or absence of selected processors. For example, if the processor and related resources defined by dashed block 27 in FIG. 1 were inoperative, the remaining three processors would boot into a fully operative system in the normal manner.
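The counter-based selection can be sketched in Python, assuming threads stand in for the unsynchronized processors and a lock-protected fetch-and-increment stands in for the serialized reads of counter 19. The names `AtomicCounter` and `boot`, the random delays, and the task labels are hypothetical. The sketch shows the property the text relies on: whichever processor reads first, and however many processors are present, exactly one receives the value 0 and is directed to test global memory.

```python
import random
import threading
import time
from typing import Dict, List, Optional


class AtomicCounter:
    """Model of counter 19: zero at power-up; each read returns the current
    value and increments it, and reads are serialized so no two readers
    see the same value."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def read(self) -> int:
        with self._lock:
            value = self._value
            self._value += 1
            return value


def boot(present_cpus: List[int], seed: Optional[int] = None) -> Dict[int, List[str]]:
    """Run the IPL step of every present processor in its own thread and
    return, for each processor, the memories it was directed to test."""
    rng = random.Random(seed)
    counter = AtomicCounter()
    plan: Dict[int, List[str]] = {}
    plan_lock = threading.Lock()
    # Unsynchronized off-chip sequencers finish at unpredictable times.
    delays = {cpu: rng.uniform(0.0, 0.01) for cpu in present_cpus}

    def ipl(cpu: int) -> None:
        time.sleep(delays[cpu])
        ticket = counter.read()           # the IPL code reads counter 19
        tasks = [f"local{cpu}"]           # every processor tests its local memory
        if ticket == 0:
            tasks.append("global")        # only the first reader tests global memory
        with plan_lock:
            plan[cpu] = tasks

    threads = [threading.Thread(target=ipl, args=(cpu,)) for cpu in present_cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return plan


if __name__ == "__main__":
    # Processor 4 is treated as missing; the remaining three still boot.
    plan = boot([1, 2, 3])
    testers = [cpu for cpu, tasks in plan.items() if "global" in tasks]
    print(plan)
    print(len(testers))  # 1
```

No processor is permanently assigned the global test, so removing any one of them changes only who wins the race. In the embodiment, memory isolators 26 would then keep the processors that did not read 0 off the global memory bus while the test runs.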
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FIG. 2 schematically illustrates the logic internal to 3. The apparatus recited in claim 2, wherein the gen- 

an embodying reset circuit. The various blocks are erated start sequence signals step each processor 

identified by function. The master clock and power through a multiple stage individually clocked start se- 

good related power on reset (POR) signals are shown quence. 

together with the inputs and outputs. The source, desti- 5 4. The apparatus recited in claim 3, wherein the 
nation, and character of the input and output signals are means for the selected processor to test the global mem- 
defined in the headings of FIGS. 3 and 4. The timing ory synchronizes the global memory reset and the 
relationships of the various signals are depicted in global memory test signals to the master clock. 
FIGS. 3 and 4. FIG. 3 illustrates the waveforms when 5. The apparatus recited in claim 1, further compris- 
the hardware reset signal starts at a low level. On the 10 ing: 

other hand, FIG. 4 illustrates the states of the various means for selecting one processor to test the global 

signals when the hardware reset starts in a high state. memory. 

The architecture and method of operation defined by 6. The apparatus recited in claim 5, further compris- 

the present invention not only synchronize various ing: 

asynchronously occurring boot type signals for a multi- 15 means for decoupling nonselected processors from 

processor system having processor local and shared global memory during the test of global memory, 

global memory, but accomplishes these objectives with 7. A method of booting a complete multi-processor 

extenuated flexibility. Namely, booting is accomplished system having individual processors, individual proces- 

with processors having independent starting sequenc- sor local memory, and a global, memory comprising the 

ers, provides master clock synchronized local and 20 steps of: 

global memory reset, and defines a process for selecting distributing a common power on signal to each pro- 

a processor to test the global memory. Foremost, these cesser; 

objectives are attainable with one or more processors starting each processor individually responsive to the 

disconnected. power on signal; 

Though the invention has been described and illus- 25 synchronizing any steps which individually start each 

trated by way of a specific embodiment, the systems and processor to a master clock; 

methods encompassed by the invention should be inter- consistently selecting at least one processor to test 

preted consistent with the breadth of the claims set both the global memory and such selected proces- 

forth hereinafter. sor's local memory; and 

What is claimed is: 30 testing the global memory by the selected processor 

1. Apparatus for booting a complete multi-processor in synchronism with the master clock. 

system having individual processors, individual proces- 8. The method recited in claim 7, wherein the step of 

sor local memory, and a global memory, comprising: starting each processor individually involves a generat- 

means for distributing a common power on signal to ing of an asynchronous start sequence for each proces- 

each processor; 35 sor. 

means for individually starting each processor re- 9. The method recited in claim 8, wherein the gener- 

sponsive to the power on signal; ated asynchronous start sequence signals step each pro- 
means for synchronizing any steps which individually cessor through a multiple stage individually clocked 

start each processor to a master clock; start sequence, 

means for consistently selecting at least one processor 40 10. The method recited in claim 9, wherein the step of 

to test both the global memory and such selected testing the global memory by a processor comprises 

processor's local memory; and synchronizing the global memory test signals and reset 

means for the selected processor to test the global signals to the master clock. 

memory in synchronism with the master clock. 11. The method recited in claim 7, further comprising 

2. The apparatus recited in claim 1, wherein the 45 the step of: 

means for individually starting generates start sequence selecting one processor to test the global memory, 

signals for each processor asynchronously. ***** 
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